Efficient Entity Disambiguation via Similarity Hashing
Abstract
Named Entity Disambiguation (NED), the task of mapping mentions of ambiguous names in natural language text onto a set of known entities, is an important problem in many areas, including machine translation and information extraction. When working with very large amounts of data (e.g. more than three million entities in YAGO), several components of an NED system can become bottlenecks: those that estimate the probability of a mention matching an entity, the similarity between a mention and an entity, and the coherence among the entity candidates for all mentions together. It is therefore challenging for an interactive NED system to achieve both high accuracy and efficiency. This thesis presents an efficient way of disambiguating named entities by similarity hashing. Our framework is integrated with AIDA, an online tool for entity detection and disambiguation developed at the Max Planck Institute for Informatics. We apply various state-of-the-art approaches, for example Locality Sensitive Hashing (LSH) and Spectral Hashing, to several forms of the similarity search problem: near-duplicate search for mention-entity matching and, in particular, related-pair detection for entity-entity mapping, which is not a typical application of hashing techniques because similarities between entities are usually low.
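To make the idea of similarity hashing concrete, the following is a minimal sketch of one common LSH scheme, MinHash with banding, applied to sets of context tokens. It is an illustration under stated assumptions, not the thesis's or AIDA's implementation: the function names (`minhash_signature`, `candidate_pairs`, `jaccard`), the token sets, and the parameter values are all hypothetical, and Jaccard similarity over token sets is assumed as the similarity measure.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token, salted by a seed (hypothetical helper)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=64):
    """MinHash signature: the minimum hash value of the set under each seeded hash.

    `tokens` must be a non-empty collection of strings.
    """
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(signatures, bands=16):
    """Group signatures into buckets band by band and return colliding pairs.

    `signatures` maps an item key to its signature; all signatures must have
    the same length, and that length is assumed to be divisible by `bands`.
    """
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(key)
    pairs = set()
    for members in buckets.values():
        # Any two items sharing at least one band bucket become candidates.
        for first, second in combinations(sorted(members), 2):
            pairs.add((first, second))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidates after hashing."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Illustrative token sets only; real systems would use keyphrases or similar.
    contexts = {
        "mention:Page": {"guitar", "led", "zeppelin", "rock", "band"},
        "entity:Jimmy_Page": {"guitar", "led", "zeppelin", "rock", "guitarist"},
        "entity:Larry_Page": {"google", "search", "engine", "founder", "stanford"},
    }
    sigs = {key: minhash_signature(tokens) for key, tokens in contexts.items()}
    for first, second in sorted(candidate_pairs(sigs)):
        print(first, second, round(jaccard(contexts[first], contexts[second]), 2))
```

With `r` rows per band and `b` bands, two sets with Jaccard similarity `s` become candidates with probability `1 - (1 - s^r)^b`. Using fewer rows per band therefore raises the chance of catching pairs with low similarity, which is the regime the abstract describes for entity-entity relatedness, at the cost of more false candidates that must be checked exactly.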
Similar resources
Trading accuracy for faster entity linking
Named entity linking (NEL) can be applied to documents such as financial reports, web pages and news articles, but state of the art disambiguation techniques are currently too slow for web-scale applications because of a high complexity with respect to the number of candidates. In this paper, we accelerate NEL by taking two successful disambiguation features (popularity and context comparabilit...
FICO: Web Person Disambiguation Via Weighted Similarity of Entity Contexts
Entity disambiguation resolves the many-to-many correspondence between mentions of entities in text and unique real-world entities. Fair Isaac’s entity disambiguation uses language-independent entity context to agglomeratively resolve mentions with similar names to unique entities. This paper describes Fair Isaac’s automatic entity disambiguation capability and assesses its performance on the Se...
Named Entity Linking Based On Wikipedia
In this paper, we present the ideas and methodologies on labeling the mentioned entities with the wiki dataset. This paper presents a system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection from Wikipedia. We focus on maximizing the similarity between the contextual information extracted from Wikipedia and the ...
Entity Disambiguation with Linkless Knowledge Bases
Named Entity Disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a reference knowledge base (e.g. Wikipedia). Such disambiguation can help add semantics to plain text and distinguish homonymous entities. Previous research has tackled this problem by making use of two types of context-aware features derived f...
Unsupervised Name Disambiguation via Social Network Similarity
Though names reference actual entities, it is nontrivial to resolve which entity a particular name observation represents. Even when names are devoid of typographical error, the resolution process is confounded by both ambiguity, where the same name correctly references multiple entities, and by variation, when an entity is correctly referenced by multiple names. Thus, before link analysis for s...
Publication date: 2012